ViLD: Open-vocabulary Object Detection via Vision and Language Knowledge Distillation - 🍣YuWd (和田唯我)'s Notes🍣

https://gyazo.com/9d44365d0e327c8592689a058e26f5e1

An open-vocabulary object detection model: the categories to detect are given as arbitrary text input

https://gyazo.com/fe3947729ba687b0b4fbfb80008565a0

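The classifier is built from CLIP features: instead of a learned classification layer with a fixed set of classes, region embeddings are compared against CLIP text embeddings of the category names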
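
A minimal sketch of what "the classifier is CLIP features" could look like, assuming each region proposal has already been mapped to an embedding in the same space as the text embeddings. The three-dimensional vectors are toy stand-ins for real CLIP text embeddings, and `classify_region`, the category names, and the temperature of 0.01 are all hypothetical choices made for illustration. The background category that a detector also needs is left out here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_region(region_embedding, text_embeddings, temperature=0.01):
    """Score one region against every category text embedding.

    The text embeddings play the role of classifier weights, so adding
    a new category only requires adding a new text embedding.
    The temperature value is an assumption for this sketch.
    """
    r = l2_normalize(region_embedding)
    logits = []
    for t in text_embeddings:
        t = l2_normalize(t)
        cosine = sum(a * b for a, b in zip(r, t))
        logits.append(cosine / temperature)
    return softmax(logits)


# Toy stand-ins: in practice these would come from a CLIP text encoder
# applied to prompts built from the category names.
category_names = ["cat", "dog", "toy elephant"]
text_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Toy stand-in for the embedding of one region proposal.
region_embedding = [0.1, 0.2, 0.9]

probs = classify_region(region_embedding, text_embeddings)
predicted = category_names[probs.index(max(probs))]
print(predicted, probs)
```

Because the category list is just a list of text embeddings, swapping in a different vocabulary at inference time does not require retraining a classification layer, which is what makes the detector open-vocabulary.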